Data Flow - ChartsMaze EDL Pipeline

The EDL Pipeline transforms raw API responses into a unified, enriched dataset through a series of orchestrated data transformations. This page traces how data flows from initial API calls through to the final compressed output.

The Central Hub: master_isin_map.json

Every data flow in the pipeline begins or depends on the master ISIN map created in Phase 1.

master_isin_map.json Structure

{
  "RELIANCE": {
    "ISIN": "INE002A01018",
    "Sid": "11915",
    "Symbol": "RELIANCE",
    "Name": "Reliance Industries Limited"
  },
  "TCS": {
    "ISIN": "INE467B01029",
    "Sid": "11536",
    "Symbol": "TCS",
    "Name": "Tata Consultancy Services Limited"
  }
  // ... 2,775 stocks total
}

Key Fields:

ISIN: International Securities Identification Number (used by ALL APIs)
Sid: Security ID (required for OHLCV and advanced indicators)
Symbol: Stock ticker (used for file naming and CSV matching)
Name: Company full name

Why It’s Critical: Every script in Phase 2+ iterates over this map to:

Know which stocks to fetch data for
Match API responses back to symbols
Ensure consistent ISIN → Symbol mapping across all datasets
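
As a rough illustration of how a downstream script might consume the map, the sketch below loads it and builds a reverse ISIN → Symbol index for matching API responses. The helper names `load_master_map` and `build_isin_index` are hypothetical and not taken from the pipeline; only the file name and field names come from the structure shown above.

```python
import json

def load_master_map(path="master_isin_map.json"):
    """Load the symbol-keyed master map from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def build_isin_index(master_map):
    """Return an ISIN -> Symbol lookup for matching API responses back to symbols."""
    return {entry["ISIN"]: symbol for symbol, entry in master_map.items()}

# Inline sample mirroring the structure shown above (no file access needed).
sample_map = {
    "RELIANCE": {"ISIN": "INE002A01018", "Sid": "11915",
                 "Symbol": "RELIANCE", "Name": "Reliance Industries Limited"},
    "TCS": {"ISIN": "INE467B01029", "Sid": "11536",
            "Symbol": "TCS", "Name": "Tata Consultancy Services Limited"},
}

isin_to_symbol = build_isin_index(sample_map)

# Typical iteration pattern: one unit of work per stock in the map.
for symbol, entry in sample_map.items():
    print(symbol, entry["ISIN"], entry["Sid"])
```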

Phase-by-Phase Data Transformation

Phase 1: Core Data Foundation

1. Market Snapshot

Script: fetch_dhan_data.py

API Call:

POST https://ow-scanx-analytics.dhan.co/customscan/fetchdt
{
  "data": {
    "type": "full",
    "whichpage": "nse_total_market",
    "count": 5000
  }
}

Raw Output: dhan_data_response.json (~5 MB)

2,775 stocks with current prices, technical indicators, volume

Derived Output: master_isin_map.json

Extracted ISIN, Sid, Symbol, Name for all stocks

2. Fundamental Data

Script: fetch_fundamental_data.py

API Calls: One per stock (2,775 requests)

POST https://open-web-scanx.dhan.co/scanx/fundamental
{"data": {"isin": "INE002A01018"}}

Output: fundamental_data.json (~35 MB)

Quarterly results (Net Profit, EPS, Sales, OPM)
Annual results (5 years history)
Balance sheet data
Shareholding patterns
Valuation ratios (ROE, ROCE, P/E)

3. Listing Dates

Source: NSE Archives CSV download

Output: nse_equity_list.csv

Symbol → Listing Date mapping

Data Available After Phase 1:

2,775 ISINs mapped to symbols
Current market data (prices, volumes, RSI)
Quarterly results plus 5 years of annual fundamentals
Listing dates

Phase 2: Data Enrichment (Parallel Fetching)

All scripts in this phase run independently using master_isin_map.json. They can execute in any order (or in parallel).

fetch_company_filings.py

Strategy: Hybrid dual-endpoint fetching

API Calls: 2 per stock × 2,775 = 5,550 requests

Deduplication Logic:

By news_id + news_date + caption
Keeps most recent 100 filings per stock
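
The deduplication rule above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `dedupe_filings` is hypothetical, and it assumes `news_date` is an ISO-style string (as in the sample output on this page) so that string sorting matches chronological order.

```python
def dedupe_filings(filings, keep=100):
    """Drop duplicates by (news_id, news_date, caption) and keep the newest `keep`."""
    seen = set()
    unique = []
    for filing in filings:
        key = (filing.get("news_id"), filing.get("news_date"), filing.get("caption"))
        if key in seen:
            continue  # same filing returned by both endpoints
        seen.add(key)
        unique.append(filing)
    # Assumption: ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    unique.sort(key=lambda f: f.get("news_date") or "", reverse=True)
    return unique[:keep]

merged = [
    {"news_id": "1", "news_date": "2024-01-15", "caption": "Outcome of Board Meeting"},
    {"news_id": "2", "news_date": "2024-02-01", "caption": "Press Release"},
    {"news_id": "1", "news_date": "2024-01-15", "caption": "Outcome of Board Meeting"},
]
deduped = dedupe_filings(merged)
```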

Output Structure:

// company_filings/RELIANCE_filings.json
[
  {
    "news_id": "123456",
    "news_date": "2024-01-15",
    "caption": "Reg. 7(2) - Outcome of Board Meeting",
    "pdf_url": "https://..."
  }
]

fetch_market_news.py

Per-Stock News Fetching:

POST https://news-live.dhan.co/v2/news/getLiveNews
{
  "stock_list": ["INE002A01018"],
  "limit": 50,
  "categories": ["ALL"]
}

Output: market_news/{SYMBOL}_news.json

Top 50 news items per stock
AI sentiment classification (positive/negative/neutral)
Timestamp, headline, source

Threading: 15 concurrent requests for speed
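
A thread pool is one common way to get this kind of bounded concurrency. The sketch below is an assumption about the general pattern, not the script's implementation: `fetch_all` and `fake_fetch` are hypothetical names, and the real fetcher would issue the POST request shown above instead of returning canned data.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all(isins, fetch_one, max_workers=15):
    """Run fetch_one(isin) for every ISIN with at most max_workers in flight."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, isin): isin for isin in isins}
        for future in as_completed(futures):
            isin = futures[future]
            try:
                results[isin] = future.result()
            except Exception as exc:
                # One failed stock should not abort the whole batch.
                errors[isin] = str(exc)
    return results, errors

def fake_fetch(isin):
    """Stand-in for the real HTTP call (hypothetical)."""
    if isin == "BAD":
        raise ValueError("simulated failure")
    return [{"headline": f"news for {isin}"}]

results, errors = fetch_all(["INE002A01018", "INE467B01029", "BAD"], fake_fetch)
```

Collecting errors per ISIN rather than raising keeps a single failing stock from discarding the work already done for the others.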

fetch_advanced_indicators.py

Requires: Sid (Security ID) from master_isin_map.json

POST https://ow-static-scanx.dhan.co/staticscanx/indicator
{
  "security_id": "11915",
  "isin": "INE002A01018",
  "symbol": "RELIANCE",
  "minute": "D"
}

Output: advanced_indicator_data.json (~8.3 MB)

Pivot Points (Classic, Fibonacci, Camarilla)
SMA signals (20, 50, 200)
EMA signals (20, 50, 200)
Technical sentiment (RSI, MACD actions)

Threading: 50 concurrent requests

fetch_corporate_actions.py

Two Time Windows:

History (2 years back):

{
  "filters": [{"field": "CorpAct.ActDate", "op": "GT", "val": "2022-01-01"}],
  "count": 5000
}

Upcoming (2 months ahead):

{
  "filters": [{"field": "CorpAct.ActDate", "op": "LT", "val": "2024-03-01"}],
  "count": 5000
}
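
The two cutoff dates can be derived from a reference date. The sketch below is only an approximation under stated assumptions: it treats "2 years" as 730 days and "2 months" as 60 days, the function name is hypothetical, and any additional filters the real script sends are not shown on this page.

```python
from datetime import date, timedelta

def build_corporate_action_payloads(today, count=5000):
    """Return (history_payload, upcoming_payload) for a given reference date."""
    history_start = today - timedelta(days=730)  # assumption: ~2 years back
    upcoming_end = today + timedelta(days=60)    # assumption: ~2 months ahead

    def payload(op, day):
        return {
            "filters": [{"field": "CorpAct.ActDate", "op": op, "val": day.isoformat()}],
            "count": count,
        }

    return payload("GT", history_start), payload("LT", upcoming_end)

history, upcoming = build_corporate_action_payloads(date(2024, 1, 1))
```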

Outputs:

history_corporate_actions.json (dividends, bonuses, splits already executed)
upcoming_corporate_actions.json (scheduled events)

Event Types:

QUARTERLY RESULTS
DIVIDEND
BONUS
SPLIT
RIGHTS ISSUE

Surveillance, Circuits, Deals

fetch_surveillance_lists.py:

nse_asm_list.json (ASM stocks with stage: LTASM/STASM)
nse_gsm_list.json (GSM stocks)

fetch_circuit_stocks.py:

upper_circuit_stocks.json (stocks hitting upper circuit)
lower_circuit_stocks.json (stocks hitting lower circuit)

fetch_bulk_block_deals.py:

bulk_block_deals.json (30 days of bulk/block deals)
Auto-pagination through all pages

fetch_incremental_price_bands.py:

incremental_price_bands.json (daily price band revisions)

fetch_complete_price_bands.py:

complete_price_bands.json (all securities with current bands)

Data Available After Phase 2:

Up to 100 regulatory filings per stock
Up to 50 news items per stock
Technical indicators (Pivots, SMA/EMA)
2 years corporate action history + 2 months upcoming
Surveillance flags
Circuit breaker status
Bulk/block deals
Price band revisions

Phase 2.5: OHLCV Data (Incremental Download)

fetch_all_ohlcv.py Flow

API Call (per stock):

POST https://openweb-ticks.dhan.co/getDataH
{
  "SYM": "RELIANCE",
  "SEC_ID": "11915",
  "INTERVAL": "D",
  "START": 215634600,  // Oct 31, 1976 (forces max history)
  "END": 1709481600    // Current timestamp
}

Smart Incremental Logic:

Check if ohlcv_data/RELIANCE.csv exists
If yes: Read last date, set START to last date + 1 day
If no: Download from 1976 (full history)
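
The decision above might look like the following sketch. It assumes each per-symbol CSV has a `Time` column holding epoch seconds in ascending order (the layout this page documents for the OHLCV output); the function name `next_start_timestamp` is hypothetical.

```python
import csv
import os
import tempfile

FULL_HISTORY_START = 215634600  # START value from the request above (forces max history)
ONE_DAY = 86400                 # seconds

def next_start_timestamp(csv_path):
    """Return the START timestamp to request for one symbol's CSV."""
    if not os.path.exists(csv_path):
        return FULL_HISTORY_START
    last_time = None
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("Time"):
                last_time = int(row["Time"])  # rows assumed to be in ascending order
    if last_time is None:
        return FULL_HISTORY_START  # header-only file: treat as a first-time download
    return last_time + ONE_DAY

# Demonstration in a temporary directory: one existing file, one missing file.
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "RELIANCE.csv")
    with open(existing, "w", newline="") as f:
        f.write("Time,Open,High,Low,Close,Volume\n")
        f.write("1609459200,2345.50,2367.80,2340.00,2360.75,12500000\n")
    incremental_start = next_start_timestamp(existing)
    full_start = next_start_timestamp(os.path.join(tmp, "TCS.csv"))
```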

Performance:

First-time: ~30 minutes (2,775 stocks × full history)
Incremental: ~2-5 minutes (only new dates)

Output CSV Format:

Time,Open,High,Low,Close,Volume
1609459200,2345.50,2367.80,2340.00,2360.75,12500000
1609545600,2365.00,2380.20,2350.10,2375.40,11800000
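
For reference, a file in this format can be read with the standard library alone. The sketch below converts the epoch-second `Time` column to calendar dates; interpreting the timestamps as UTC is an assumption, and `read_ohlcv` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime, timezone

SAMPLE = """Time,Open,High,Low,Close,Volume
1609459200,2345.50,2367.80,2340.00,2360.75,12500000
1609545600,2365.00,2380.20,2350.10,2375.40,11800000
"""

def read_ohlcv(handle):
    """Parse OHLCV rows from an open text handle into typed dictionaries."""
    bars = []
    for row in csv.DictReader(handle):
        bars.append({
            # Assumption: timestamps are interpreted as UTC.
            "Date": datetime.fromtimestamp(int(row["Time"]), tz=timezone.utc).date(),
            "Open": float(row["Open"]),
            "High": float(row["High"]),
            "Low": float(row["Low"]),
            "Close": float(row["Close"]),
            "Volume": int(row["Volume"]),
        })
    return bars

bars = read_ohlcv(io.StringIO(SAMPLE))
```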

Data Available After Phase 2.5:

Daily OHLCV data for all stocks (from listing date to today)
~2,775 CSV files in ohlcv_data/ directory

Phase 3: Base Analysis (Creating Master JSON)

bulk_market_analyzer.py Transformation

Inputs:

fundamental_data.json → Financial metrics
dhan_data_response.json → Current prices, technical indicators
advanced_indicator_data.json → Pivots, SMA/EMA signals
nse_equity_list.csv → Listing dates

Key Transformations:

Quarterly Metrics Extraction:
- Raw: "NET_PROFIT": "1250.5|1180.2|1090.8|1050.3|1100.1"
- Extracted:
  - Net Profit Latest Quarter: 1250.5
  - Net Profit Previous Quarter: 1180.2
  - Net Profit Last Year Quarter: 1100.1
- Calculated:
  - QoQ % Net Profit Latest: ((1250.5 - 1180.2) / 1180.2) × 100 = 5.96%
  - YoY % Net Profit Latest: ((1250.5 - 1100.1) / 1100.1) × 100 = 13.67%
Valuation Ratios:
- D/E Ratio: Non-Current Liabilities / Total Equity
- PEG Ratio: P/E / YoY EPS Growth
- Forward P/E: P/E × (TTM EPS / Annualized Latest EPS)
Shareholding Changes:
- Raw: "FII": "25.3|24.1"
- Calculated: FII % change QoQ: 25.3 - 24.1 = 1.2 percentage points
- Free Float: 100 - Promoter%
- Float Shares: Total Shares × (Free Float / 100)
Technical Indicator Parsing:
- SMA Status: “SMA 20: Above (4.9%) | SMA 50: Above (24.1%)”
- EMA Status: “EMA 20: Above (6.3%) | EMA 200: Above (72.6%)”
- Technical Sentiment: “RSI: Neutral | MACD: Bearish”
Index Membership:
- Filters tech.idxlist for specific indices (Nifty 50, Bank Nifty, etc.)
- Comma-separated list: “NIFTY 50, NIFTY BANK, NIFTY 100”
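
The quarterly and shareholding calculations above can be reproduced with a few lines of Python. This is a minimal sketch using the sample values from this section; the helper names are hypothetical, and the index positions (0 = latest, 1 = previous, 4 = same quarter last year) follow the extraction shown above.

```python
def parse_series(raw):
    """Split a pipe-separated string such as '1250.5|1180.2' into floats."""
    return [float(part) for part in raw.split("|") if part.strip()]

def pct_change(latest, base):
    """Percentage change of latest over base, or None when base is zero."""
    if base == 0:
        return None
    return round((latest - base) / base * 100, 2)

net_profit = parse_series("1250.5|1180.2|1090.8|1050.3|1100.1")
latest, previous, last_year = net_profit[0], net_profit[1], net_profit[4]

qoq = pct_change(latest, previous)   # quarter over quarter
yoy = pct_change(latest, last_year)  # year over year

# Shareholding change is a simple difference in percentage points.
fii = parse_series("25.3|24.1")
fii_change = round(fii[0] - fii[1], 2)

def free_float(promoter_pct):
    """Free float as defined above: 100 minus promoter holding."""
    return round(100 - promoter_pct, 2)

def float_shares(total_shares, free_float_pct):
    """Float shares as defined above: total shares scaled by free float."""
    return total_shares * free_float_pct / 100
```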

Output Structure (60+ fields per stock):

[
  {
    "Symbol": "RELIANCE",
    "Name": "Reliance Industries Limited",
    "Listing Date": "29-NOV-1977",
    "Basic Industry": "Refineries",
    "Sector": "Energy",
    "Market Cap(Cr.)": 1700000,
    "Latest Quarter": "Dec-2023",
    "Net Profit Latest Quarter": 18500,
    "QoQ % Net Profit Latest": 5.11,
    "YoY % Net Profit Latest": 12.3,
    // ... 50+ more fields
  }
]

Data Available After Phase 3:

Base JSON with 60+ fields for all 2,775 stocks
Identity, Fundamentals, Valuation, Ownership, Technical indicators
Ready for in-place enrichment in Phase 4

Phase 4: Enrichment Injection (Sequential Modifications)

Each script in this phase reads all_stocks_fundamental_analysis.json, modifies it in-place, and writes it back.

Critical: These scripts must run in exact order because later scripts depend on fields added by earlier ones.
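
The read, modify, write-back cycle and its ordering constraint can be sketched as below. Everything here is illustrative: `run_step`, `add_price_change`, and `add_direction` are hypothetical, and writing through a temporary file followed by `os.replace` is a design choice of this sketch, not something the page states about the real scripts.

```python
import json
import os
import tempfile

def run_step(path, step):
    """Load the master JSON, apply one enrichment step to every stock, write it back."""
    with open(path, encoding="utf-8") as f:
        stocks = json.load(f)
    for stock in stocks:
        step(stock)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(stocks, f, ensure_ascii=False)
    os.replace(tmp_path, path)  # swap in the new file only after a complete write

def add_price_change(stock):
    """Hypothetical first step: adds a field later steps rely on."""
    stock["Change"] = stock["Close"] - stock["Prev Close"]

def add_direction(stock):
    """Hypothetical second step: fails with KeyError if run before add_price_change."""
    stock["Direction"] = "Up" if stock["Change"] > 0 else "Down"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "all_stocks_fundamental_analysis.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump([{"Symbol": "RELIANCE", "Close": 105.0, "Prev Close": 100.0}], f)
    for step in (add_price_change, add_direction):  # order matters
        run_step(path, step)
    with open(path, encoding="utf-8") as f:
        enriched = json.load(f)
```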

1. Advanced Metrics (OHLCV-based)

Script: advanced_metrics_processor.py

Reads: ohlcv_data/{SYMBOL}.csv for each stock

Calculations:

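import pandas as pd

# symbol: the stock's Symbol from the master JSON
df = pd.read_csv(f"ohlcv_data/{symbol}.csv")
latest_open = df['Open'].iloc[-1]
latest_close = df['Close'].iloc[-1]
latest_volume = df['Volume'].iloc[-1]
prev_close = df['Close'].iloc[-2]

# ATH (All-Time High); % from ATH is negative while price is below the high
ath = df['High'].max()
pct_from_ath = ((latest_close - ath) / ath) * 100

# ADR (Average Daily Range), averaged over the last N sessions
df['Daily_Range_Pct'] = ((df['High'] - df['Low']) / df['Low']) * 100
adr_5 = df['Daily_Range_Pct'].tail(5).mean()
adr_14 = df['Daily_Range_Pct'].tail(14).mean()
adr_20 = df['Daily_Range_Pct'].tail(20).mean()
adr_30 = df['Daily_Range_Pct'].tail(30).mean()

# RVOL (Relative Volume): latest volume vs. the average of the prior 20 sessions
avg_vol_20 = df['Volume'].tail(21).iloc[:-1].mean()
rvol = latest_volume / avg_vol_20

# Gap Up %: today's open relative to the previous close
gap_up = ((latest_open - prev_close) / prev_close) * 100

# Turnover (Rupee Volume) in crores (1 crore = 10,000,000)
df['Turnover_Cr'] = (df['Close'] * df['Volume']) / 10000000
turnover_20 = df['Turnover_Cr'].tail(20).mean()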

Fields Added (15 fields):

ATH, % from ATH
5/14/20/30 Days MA ADR(%)
RVOL
Gap Up %, Day Range %
% from 52W Low
6 Month Returns(%)
200 Days EMA Volume
% from 52W High 200 Days EMA Volume
Daily Rupee Turnover 20/50/100(Cr.)
30 Days Average Rupee Volume(Cr.)

2. Earnings Performance

Script: process_earnings_performance.py

Logic:

Read company_filings/{SYMBOL}_filings.json
Find most recent “Quarterly Results” filing
Extract date and closing price on that day from OHLCV
Calculate returns from earnings day to current price
Find max price since earnings to calculate peak returns

Pseudocode:

results_date = find_latest_quarterly_results_filing(symbol)
results_close = ohlcv_df.loc[results_date, 'Close']
current_price = ohlcv_df.iloc[-1]['Close']

returns = ((current_price - results_close) / results_close) * 100

max_price_since = ohlcv_df[results_date:]['High'].max()
max_returns = ((max_price_since - results_close) / results_close) * 100

Fields Added (3 fields):

Quarterly Results Date
Returns since Earnings(%)
Max Returns since Earnings(%)
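
A runnable variant of the earnings calculation is sketched below using plain Python lists. It makes one assumption the pseudocode does not address: when the results date is not a trading day, the first bar on or after that date is used as the base. The function name and the sample prices are hypothetical.

```python
from datetime import date

def earnings_returns(bars, results_date):
    """bars: dicts with Date, High, Close in ascending date order."""
    since = [bar for bar in bars if bar["Date"] >= results_date]
    if not since:
        return None  # no price data on or after the results date
    base = since[0]["Close"]       # close on (or first session after) results day
    current = bars[-1]["Close"]
    peak = max(bar["High"] for bar in since)
    return {
        "Returns since Earnings(%)": round((current - base) / base * 100, 2),
        "Max Returns since Earnings(%)": round((peak - base) / base * 100, 2),
    }

bars = [
    {"Date": date(2024, 1, 12), "High": 99.0, "Close": 98.0},
    {"Date": date(2024, 1, 15), "High": 101.0, "Close": 100.0},
    {"Date": date(2024, 1, 16), "High": 110.0, "Close": 108.0},
    {"Date": date(2024, 1, 17), "High": 107.0, "Close": 105.0},
]

# Results published on a Sunday: the 15-Jan bar becomes the base.
performance = earnings_returns(bars, date(2024, 1, 14))
```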

3. F&O Data Enrichment

Script: enrich_fno_data.py

Reads:

fno_lot_sizes_cleaned.json (lot size mapping)
fno_expiry_calendar.json (next expiry dates)
fno_stocks_response.json (F&O stock list)

Logic:

If symbol in F&O list → set FNO Flag: Yes
Look up lot size from mapping
Find next expiry date from calendar

Fields Added (3 fields):

FNO Flag (Yes/No)
Lot Size
Next Expiry (date)
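
The lookup logic might be sketched as follows. The shapes of the three inputs are assumptions, since the page does not show the contents of the F&O files: a set of F&O symbols, a Symbol → lot size dictionary, and a list of expiry dates. The function name `enrich_fno` is hypothetical, and the date format follows the "Next Expiry" sample in the final schema.

```python
from datetime import date

def enrich_fno(stock, fno_symbols, lot_sizes, expiry_dates, today):
    """Add FNO Flag, Lot Size, and Next Expiry to one stock record in place."""
    symbol = stock["Symbol"]
    if symbol not in fno_symbols:
        stock["FNO Flag"] = "No"
        return stock
    stock["FNO Flag"] = "Yes"
    stock["Lot Size"] = lot_sizes.get(symbol)
    upcoming = sorted(d for d in expiry_dates if d >= today)
    stock["Next Expiry"] = upcoming[0].strftime("%d-%b-%Y") if upcoming else None
    return stock

# Assumed input shapes (hypothetical sample data).
fno_symbols = {"RELIANCE"}
lot_sizes = {"RELIANCE": 250}
expiry_dates = [date(2024, 2, 29), date(2024, 3, 28), date(2024, 4, 25)]

reliance = enrich_fno({"Symbol": "RELIANCE"}, fno_symbols, lot_sizes,
                      expiry_dates, today=date(2024, 3, 3))
other = enrich_fno({"Symbol": "XYZ"}, fno_symbols, lot_sizes,
                   expiry_dates, today=date(2024, 3, 3))
```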

4. Market Breadth & Relative Strength

Script: process_market_breadth.py

Calculation:

Uses return data already in base JSON
Computes relative strength rating (1-100)
Generates market breadth statistics

Fields Added:

Relative Strength Rating
Market breadth percentile
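
The exact rating formula is not given on this page. One plausible reading, shown purely as an assumption, is a percentile rank of a return field scaled to 1-100. The choice of "1 Year Returns(%)" as the input, the function name, and the tie handling (ties keep their sorted order) are all hypothetical.

```python
def relative_strength_ratings(stocks, field="1 Year Returns(%)"):
    """Rank stocks by a return field and scale the rank to 1-100 (assumed formula)."""
    ranked = sorted(
        (s for s in stocks if s.get(field) is not None),
        key=lambda s: s[field],
    )
    n = len(ranked)
    ratings = {}
    for position, stock in enumerate(ranked):
        if n == 1:
            rating = 100
        else:
            # Weakest stock maps to 1, strongest to 100.
            rating = 1 + round(99 * position / (n - 1))
        ratings[stock["Symbol"]] = rating
    return ratings

universe = [
    {"Symbol": "A", "1 Year Returns(%)": -10.0},
    {"Symbol": "B", "1 Year Returns(%)": 5.0},
    {"Symbol": "C", "1 Year Returns(%)": 12.0},
    {"Symbol": "D", "1 Year Returns(%)": 30.0},
    {"Symbol": "E", "1 Year Returns(%)": None},  # skipped: no return data
]
ratings = relative_strength_ratings(universe)
```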

5. Historical Market Breadth

Script: process_historical_market_breadth.py

Output: Separate time-series file for charting (not added to master JSON)

6. Corporate Events & News (FINAL)

Script: add_corporate_events.py

Event Markers Logic:

# Surveillance
if symbol in asm_list and "LTASM" in stage:
    events.append("★: LTASM")

# Upcoming Corporate Actions (within 30 days)
for action in upcoming_actions:
    if action['Symbol'] == symbol:
        if "DIVIDEND" in action['Type']:
            events.append(f"💸: Dividend ({action['Date']})")
        elif "BONUS" in action['Type']:
            events.append(f"🎁: Bonus ({action['Date']})")
        elif "RESULTS" in action['Type'] and within_14_days:
            events.append(f"⏰: Results ({action['Date']})")

# Block Deals (last 7 days)
if symbol in recent_deals:
    events.append("📦: Block Deal")

# Price Band Revision
if symbol in circuit_revisions:
    events.append("#: +/- Revision")

Recent Announcements (Top 5):

filings = load_filings(f"company_filings/{symbol}_filings.json")
top_5 = sorted(filings, key=lambda x: x['news_date'], reverse=True)[:5]

announcements = [
    {
        "Date": filing['news_date'],
        "Headline": filing['caption'],
        "URL": filing['pdf_url']
    }
    for filing in top_5
]

News Feed (Top 5):

news = load_news(f"market_news/{symbol}_news.json")
top_5 = sorted(news, key=lambda x: x['timestamp'], reverse=True)[:5]

news_feed = [
    {
        "Title": item['headline'],
        "Sentiment": item['sentiment'],  # positive/negative/neutral
        "Date": item['date']
    }
    for item in top_5
]

Fields Added (3 compound fields):

Event Markers (array of icons/labels)
Recent Announcements (array of 5 objects)
News Feed (array of 5 objects)

Data Available After Phase 4:

Complete JSON with all 86 fields for all 2,775 stocks
Ready for compression

Phase 5: Compression

Simple gzip compression of the final JSON:

import gzip

with open("all_stocks_fundamental_analysis.json", "rb") as f_in:
    data = f_in.read()
    
with gzip.open("all_stocks_fundamental_analysis.json.gz", "wb", compresslevel=9) as f_out:
    f_out.write(data)

Compression Results:

Raw JSON: ~38 MB
Compressed: ~7.5 MB
Ratio: 80% reduction
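
A consumer of the compressed file can read it back directly with `gzip.open` in text mode. The sketch below also recomputes the reduction figure from the sizes quoted above; the helper names are hypothetical, and the round trip runs on a tiny temporary file rather than the real output.

```python
import gzip
import json
import os
import tempfile

def reduction_pct(raw_size, compressed_size):
    """Size reduction as a percentage of the raw size."""
    return round((1 - compressed_size / raw_size) * 100, 1)

def load_compressed(path):
    """Read a gzip-compressed JSON file without decompressing it to disk first."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

# Round trip on a small temporary file.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "sample.json.gz")
    original = [{"Symbol": "RELIANCE", "RVOL": 1.25}]
    with gzip.open(gz_path, "wt", encoding="utf-8", compresslevel=9) as f:
        json.dump(original, f)
    restored = load_compressed(gz_path)

quoted_reduction = reduction_pct(38, 7.5)  # sizes in MB from the figures above
```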

Final Output Structure

Complete JSON Schema

[
  {
    // ─── Identity (6 fields) ───
    "Symbol": "RELIANCE",
    "Name": "Reliance Industries Limited",
    "Listing Date": "29-NOV-1977",
    "Basic Industry": "Refineries",
    "Sector": "Energy",
    "Index": "NIFTY 50, NIFTY 100, NIFTY ENERGY",
    
    // ─── Valuation (7 fields) ───
    "Market Cap(Cr.)": 1700000,
    "Stock Price(₹)": 2540.75,
    "P/E": 28.5,
    "Forward P/E": 26.2,
    "Historical P/E 5": 0.0,
    "PEG": 2.31,
    "% from 52W High": -8.5,
    
    // ─── Fundamentals - Quarterly (32 fields) ───
    "Latest Quarter": "Dec-2023",
    "Net Profit Latest Quarter": 18500,
    "Net Profit Previous Quarter": 17600,
    "QoQ % Net Profit Latest": 5.11,
    "YoY % Net Profit Latest": 12.3,
    // ... (EPS, Sales, OPM with Latest/Previous/2Q/3Q/LastYr)
    
    // ─── Fundamentals - Ratios (5 fields) ───
    "ROE(%)": 15.2,
    "ROCE(%)": 12.8,
    "D/E": 0.45,
    "OPM TTM(%)": 11.5,
    "Sales Growth 5 Years(%)": 8.7,
    
    // ─── Ownership (4 fields) ───
    "FII % change QoQ": 1.2,
    "DII % change QoQ": -0.5,
    "Free Float(%)": 49.5,
    "Float Shares(Cr.)": 336.5,
    
    // ─── Price Performance (6 fields) ───
    "1 Day Returns(%)": 0.8,
    "1 Week Returns(%)": 2.5,
    "1 Month Returns(%)": 4.2,
    "3 Month Returns(%)": 8.5,
    "6 Month Returns(%)": 15.3,
    "1 Year Returns(%)": 28.7,
    
    // ─── Technical Indicators (6 fields) ───
    "RSI (14)": 62.5,
    "SMA Status": "SMA 20: Above (4.9%) | SMA 50: Above (24.1%)",
    "EMA Status": "EMA 20: Above (6.3%) | EMA 200: Above (72.6%)",
    "Technical Sentiment": "RSI: Neutral | MACD: Bearish",
    "Pivot Point": "2485.50",
    "Gap Up %": 0.3,
    
    // ─── Volatility & Volume (11 fields) ───
    "5 Days MA ADR(%)": 2.1,
    "14 Days MA ADR(%)": 2.3,
    "20 Days MA ADR(%)": 2.4,
    "30 Days MA ADR(%)": 2.5,
    "Day Range(%)": 1.8,
    "RVOL": 1.25,
    "200 Days EMA Volume": 8500000,
    "% from 52W High 200 Days EMA Volume": -15.2,
    "Daily Rupee Turnover 20(Cr.)": 1250,
    "Daily Rupee Turnover 50(Cr.)": 1180,
    "Daily Rupee Turnover 100(Cr.)": 1150,
    "30 Days Average Rupee Volume(Cr.)": 1200,
    
    // ─── Historical Metrics (3 fields) ───
    "ATH": 2975.50,
    "% from ATH": -14.6,
    "% from 52W Low": 35.8,
    
    // ─── Earnings (3 fields) ───
    "Quarterly Results Date": "15-Jan-2024",
    "Returns since Earnings(%)": 3.5,
    "Max Returns since Earnings(%)": 8.2,
    
    // ─── F&O Data (3 fields) ───
    "FNO Flag": "Yes",
    "Lot Size": 250,
    "Next Expiry": "28-Mar-2024",
    
    // ─── Circuit Info (1 field) ───
    "Circuit Limit": "20%",
    
    // ─── Event Markers (1 array field) ───
    "Event Markers": [
      "📊: Results Recently Out",
      "💸: Dividend (15-Mar)"
    ],
    
    // ─── Recent Announcements (1 array field) ───
    "Recent Announcements": [
      {
        "Date": "2024-01-15",
        "Headline": "Outcome of Board Meeting - Quarterly Results",
        "URL": "https://www.bseindia.com/..."
      },
      // ... 4 more
    ],
    
    // ─── News Feed (1 array field) ───
    "News Feed": [
      {
        "Title": "Reliance announces new green energy initiative",
        "Sentiment": "positive",
        "Date": "2024-03-02"
      },
      // ... 4 more
    ]
  }
  // ... 2,774 more stocks
]

Total: 86 fields per stock × 2,775 stocks = 238,650 data points

Data Lineage Summary

Next Steps

Output Schema

Detailed breakdown of all 86 fields

Pipeline Architecture

Understand the 6-phase design

API Endpoints

Complete Dhan API endpoint reference

Pipeline Settings

Configure pipeline behavior and flags

OHLCV Configuration

Optimize OHLCV download strategy